Hierarchical cluster language modeling with statistical rule extraction for rescoring n-best hypotheses during speech decoding

Authors

Photina Jaeyun Jang

Alexander G. Hauptmann

Abstract

We propose an unsupervised learning algorithm that learns hierarchical patterns of word sequences in spoken-language utterances. It extracts cluster rules from training data based on high n-gram probabilities and uses them to cluster words or segment a sentence. Cluster trees, similar to parse trees, are constructed from the learned cluster rules. Through hierarchical clustering, we add grammatical structure to the traditional trigram language model. The learned cluster rules are used to improve the n-best utterance hypothesis list output by the Sphinx III speech recognizer: our hierarchical cluster language model rescores and filters these n-best hypotheses, assigning confidence scores to segments of hypotheses that can be clustered hierarchically with the learned rules. Rescoring the original n-best list, which is based on acoustic and trigram language model scores, with our hierarchical cluster language model yields a set of hypotheses with a lower word error rate. The cluster language model was trained on TREC broadcast news data from 1995 and 1996 and tested on the HUB-4 '97 development test broadcast news data. Compared to manually created grammar rules, the cluster trees reflect the speech data more accurately, since their rules are learned automatically from empirical n-gram probabilities in the training data, whereas manually written grammar rules can introduce human bias and are expensive to develop. Prior symbolic knowledge in the form of rules can also be incorporated by simply applying those rules to the training data before the earliest applicable learning iteration. The algorithm can also learn clusters that reflect different styles of data, whether formal, strictly grammatical language or loose conversational speech.
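The abstract describes rule extraction, cluster trees, and n-best rescoring only at a high level. The following is a minimal Python sketch of one way these pieces could fit together. It is not the authors' algorithm. It assumes a greedy scheme that, in each iteration, merges the adjacent token pair with the highest bigram probability P(b | a) above a threshold. It also assumes that nested bracketed strings such as `(the (white house))` stand in for cluster trees, and that the rescoring bonus is the fraction of a hypothesis's words covered by multi-word clusters, scaled by a weight. All names and parameters (`extract_rules`, `apply_rule`, `rescore`, `min_count`, `min_prob`, `weight`) are hypothetical, and so are the example scores.

```python
from collections import Counter
from typing import List, Tuple

Rule = Tuple[str, str]  # hypothetical: (left token, right token) merged into one cluster token


def apply_rule(tokens: List[str], rule: Rule) -> List[str]:
    """Merge every non-overlapping occurrence of `rule`, scanning left to right."""
    left, right = rule
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(f"({left} {right})")  # bracketed string doubles as a cluster tree
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def extract_rules(corpus: List[List[str]], iterations: int = 5,
                  min_count: int = 2, min_prob: float = 0.5) -> List[Rule]:
    """Assumed greedy scheme: each iteration keeps the adjacent pair (a, b) with
    the highest P(b | a) ~ c(a, b) / c(a), subject to both thresholds, then
    rewrites the corpus so later iterations can build on earlier clusters."""
    sents = [list(s) for s in corpus]
    rules: List[Rule] = []
    for _ in range(iterations):
        unigrams = Counter(tok for s in sents for tok in s)
        bigrams = Counter(pair for s in sents for pair in zip(s, s[1:]))
        candidates = [
            (count / unigrams[a], count, (a, b))
            for (a, b), count in bigrams.items()
            if count >= min_count and count / unigrams[a] >= min_prob
        ]
        if not candidates:
            break
        best = max(candidates)[2]  # ties broken by count, then lexically
        rules.append(best)
        sents = [apply_rule(s, best) for s in sents]
    return rules


def cluster(tokens: List[str], rules: List[Rule]) -> List[str]:
    """Apply the learned rules in the order they were learned."""
    for rule in rules:
        tokens = apply_rule(tokens, rule)
    return tokens


def words_in(token: str) -> int:
    """Number of original words inside a (possibly nested) cluster token."""
    return len(token.replace("(", " ").replace(")", " ").split())


def rescore(nbest: List[Tuple[List[str], float]], rules: List[Rule],
            weight: float = 10.0) -> List[Tuple[List[str], float, List[str]]]:
    """Add a bonus proportional to the fraction of words covered by multi-word
    clusters; `score` is assumed to be a log-domain score (higher is better)."""
    rescored = []
    for words, score in nbest:
        tree = cluster(list(words), rules)
        covered = sum(words_in(t) for t in tree if words_in(t) > 1)
        bonus = weight * covered / max(len(words), 1)
        rescored.append((words, score + bonus, tree))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Toy usage with made-up training sentences and n-best scores.
corpus = [s.split() for s in [
    "the white house said",
    "the white house announced",
    "in the white house",
]]
rules = extract_rules(corpus)
nbest = [("the white horse said".split(), -120.0),
         ("the white house said".split(), -121.0)]
ranked = rescore(nbest, rules)
print(ranked[0][2])  # ['(the (white house))', 'said']
```

Here the learned rules first merge `white house` and then `the (white house)`, so the second hypothesis receives a bonus and moves to the top of the list. Hand-written rules could be folded in as the abstract describes, by applying them to `corpus` with `apply_rule` before calling `extract_rules`. The bonus weight would presumably need tuning on held-out data.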


Similar resources

Direct word graph rescoring using a* search and RNNLM

The usage of Recurrent Neural Network Language Models (RNNLMs) has allowed reaching significant improvements in Automatic Speech Recognition (ASR) tasks. However, to take advantage of their capability for considering long histories, they are usually used to rescore the N-best lists (i.e. it is in practice not possible to use them directly during acoustic trellis search). We propose in this pape...

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition

Using context in automatic speech recognition allows the recognition system to dynamically task-adapt and bring gains to a broad variety of use-cases. An important mechanism of context inclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real-time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...

Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine

Effectively exploiting the resources available on modern multicore and manycore processors for tasks such as large vocabulary continuous speech recognition (LVCSR) is far from trivial. While prior works have demonstrated the effectiveness of manycore graphic processing units (GPU) for high-throughput, limited vocabulary speech recognition, they are unsuitable for recognition with large acoustic...

Fuzzy class rescoring: a part-of-speech language model

Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...

Exploiting repair context in interactive error recovery

In current speech applications, facilities to correct recognition errors are limited to either choosing among alternative hypotheses (either by voice or by mouse click) or respeaking. Information from the context of a repair is ignored. We developed a method which improves the accuracy of correcting speech recognition errors interactively by taking into account the context of the repair interaction...

Publication date: 1998
